8 research outputs found

    Nonparametric enrichment in computational and biological representations of distributions

    This thesis proposes nonparametric techniques to enhance unsupervised learning methods in computational or biological contexts. Representations of intractable distributions and their relevant statistics are enhanced by nonparametric components trained to handle challenging estimation problems. The first part introduces a generic algorithm for learning generative latent variable models. In contrast to traditional variational learning, no representation of the intractable posterior distributions is computed, making the algorithm agnostic to the model structure and the support of the latent variables. Kernel ridge regression is used to consistently estimate the gradient for learning. In many unsupervised tasks, this approach outperforms advanced alternatives based on the expectation-maximisation algorithm and variational approximate inference. In the second part, I train a model of data known as the kernel exponential family density. The kernel, used to describe smooth functions, is augmented by a parametric component trained using an efficient meta-learning procedure; meta-learning prevents the overfitting that would occur with conventional training routines. After training, the contours of the kernel become adaptive to the local geometry of the underlying density. Compared to maximum-likelihood learning, our method better captures the shape of the density, which is the desired quantity in many downstream applications. The final part shows how nonparametric ideas contribute to understanding uncertainty computation in the brain. First, I show that neural networks can learn to represent uncertainty using the distributed distributional code (DDC), a representation similar to the nonparametric kernel mean embedding. I then derive several DDC-based message-passing algorithms, including computations of filtering and real-time smoothing. The latter is a common neural computation embodied in many postdictive phenomena of perception in multiple modalities. The main idea behind these algorithms is least-squares regression, where the training data are simulated from an internal model. The internal model can be concurrently updated to follow the statistics of the sensory stimuli, enabling adaptive inference.
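
    As a rough illustration of the least-squares idea described in the final part, the sketch below fits a linear map from observation features to latent features on data simulated from a toy Gaussian internal model, then reads out an approximate DDC (posterior feature expectations) for a new observation. The model, feature choices, and function names are illustrative assumptions, not the thesis implementation.

```python
import numpy as np

# Illustrative sketch only: posterior expectations E[psi(z) | x] (a DDC of the
# posterior) are read out linearly from input features phi(x), with the linear
# map fitted by least-squares on data simulated from an internal model.

rng = np.random.default_rng(0)

def simulate_internal_model(n):
    """Toy internal model: latent z ~ N(0, 1), observation x = z + noise."""
    z = rng.normal(size=n)
    x = z + 0.5 * rng.normal(size=n)
    return z, x

def psi(z):
    """Latent features; their posterior means form the DDC of p(z | x)."""
    centers = np.linspace(-3.0, 3.0, 20)
    return np.exp(-0.5 * (z[:, None] - centers) ** 2)

def phi(x):
    """Observation features used as regressors (plus a bias column)."""
    centers = np.linspace(-4.0, 4.0, 30)
    f = np.exp(-0.5 * (x[:, None] - centers) ** 2)
    return np.hstack([f, np.ones((len(x), 1))])

# 1. Simulate training pairs from the internal model.
z, x = simulate_internal_model(20_000)

# 2. Least-squares regression from phi(x) to psi(z): applying the fitted map
#    to a new phi(x*) approximates E[psi(z) | x = x*].
W, *_ = np.linalg.lstsq(phi(x), psi(z), rcond=None)

# 3. Read out the DDC (approximate posterior expectations) for a new input.
x_star = np.array([1.2])
ddc_posterior = phi(x_star) @ W
print(ddc_posterior.round(3))
```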

    Language Modeling Is Compression

    It has long been established that predictive models can be transformed into lossless compressors and vice versa. Incidentally, in recent years, the machine learning community has focused on training increasingly large and powerful self-supervised (language) models. Since these large language models exhibit impressive predictive capabilities, they are well-positioned to be strong compressors. In this work, we advocate for viewing the prediction problem through the lens of compression and evaluate the compression capabilities of large (foundation) models. We show that large language models are powerful general-purpose predictors and that the compression viewpoint provides novel insights into scaling laws, tokenization, and in-context learning. For example, Chinchilla 70B, while trained primarily on text, compresses ImageNet patches to 43.4% and LibriSpeech samples to 16.4% of their raw size, beating domain-specific compressors like PNG (58.5%) or FLAC (30.3%), respectively. Finally, we show that the prediction-compression equivalence allows us to use any compressor (like gzip) to build a conditional generative model.
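
    The closing claim, that any compressor induces a conditional generative model, can be illustrated with a small sketch: a compressor's output length behaves like a code length (an implicit negative log-probability), so candidate continuations can be ranked by the extra bytes they cost given the context. The snippet below uses Python's standard gzip module and is only an illustration of the idea, not the paper's arithmetic-coding setup.

```python
import gzip

def code_length(data: bytes) -> int:
    # Compressed size in bytes, used as a (coarse) code length for the data.
    return len(gzip.compress(data))

def rank_continuations(context: bytes, candidates: list[bytes]) -> list[tuple[bytes, int]]:
    # Score each candidate by the extra code length it adds to the context;
    # fewer extra bytes means the compressor finds it more predictable.
    base = code_length(context)
    scored = [(c, code_length(context + c) - base) for c in candidates]
    return sorted(scored, key=lambda t: t[1])

context = b"the cat sat on the mat. the cat sat on the " * 4
candidates = [b"mat.", b"zebra", b"qwxjv"]
print(rank_continuations(context, candidates))
# The continuation that repeats earlier context should compress cheapest,
# i.e. be ranked as the most likely next chunk.
```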

    Neural Networks and the Chomsky Hierarchy

    Reliable generalization lies at the heart of safe ML and AI. However, understanding when and how neural networks generalize remains one of the most important unsolved problems in the field. In this work, we conduct an extensive empirical study (2200 models, 16 tasks) to investigate whether insights from the theory of computation can predict the limits of neural network generalization in practice. We demonstrate that grouping tasks according to the Chomsky hierarchy allows us to forecast whether certain architectures will be able to generalize to out-of-distribution inputs. This includes negative results where even extensive amounts of data and training time never led to any non-trivial generalization, despite models having sufficient capacity to perfectly fit the training data. Our results show that, for our subset of tasks, RNNs and Transformers fail to generalize on non-regular tasks, LSTMs can solve regular and counter-language tasks, and only networks augmented with structured memory (such as a stack or memory tape) can successfully generalize on context-free and context-sensitive tasks.
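
    To make the task grouping concrete, the sketch below generates a toy regular-language task (parity of a bit string) with a length-based out-of-distribution split: training on short strings and evaluating on strictly longer ones. The task choice and length ranges are illustrative assumptions, not the paper's exact benchmark.

```python
import random

def parity_example(length: int) -> tuple[str, int]:
    """Regular-language task: the label is the parity of the number of 1s."""
    bits = [random.randint(0, 1) for _ in range(length)]
    return "".join(map(str, bits)), sum(bits) % 2

def make_split(n: int, lengths: range) -> list[tuple[str, int]]:
    # Sample n labelled strings with lengths drawn from the given range.
    return [parity_example(random.choice(lengths)) for _ in range(n)]

train = make_split(10_000, range(1, 41))       # train on short strings
test_ood = make_split(1_000, range(41, 101))   # test on strictly longer ones
print(train[0], test_ood[0])
```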

    Strange Hadron Spectroscopy with Secondary KL Beam in Hall D

    Final version of the KLF Proposal [C12-19-001] approved by JLab PAC48. The intermediate version of the proposal was posted in arXiv:1707.05284 [hep-ex]. 103 pages, 52 figures, 8 tables, 324 references. Several typos were fixed.

    We propose to create a secondary beam of neutral kaons in Hall D at Jefferson Lab to be used with the GlueX experimental setup for strange hadron spectroscopy. The superior CEBAF electron beam will enable a flux on the order of 1×10^4 K_L/sec, which exceeds the flux previously attained at SLAC by three orders of magnitude. The use of a deuteron target will provide the first-ever measurements with neutral kaons on neutrons. The experiment will measure both differential cross sections and self-analyzed polarizations of the produced Λ, Σ, Ξ, and Ω hyperons using the GlueX detector in Hall D at Jefferson Lab. The measurements will span CM cos θ from −0.95 to 0.95 in the range W = 1490 MeV to 2500 MeV. The new data will significantly constrain the partial wave analyses, reduce model-dependent uncertainties in the extraction of the properties and pole positions of the strange hyperon resonances, and establish the orbitally excited multiplets in the spectra of the Ξ and Ω hyperons. Comparison with the corresponding multiplets in the spectra of the charm and bottom hyperons will provide insight into the accuracy of QCD-based calculations over a large range of masses. The proposed facility will have a defining impact in the strange meson sector through measurements of the final-state Kπ system up to 2 GeV invariant mass. This will allow the determination of pole positions and widths of all relevant K*(Kπ) S-, P-, D-, F-, and G-wave resonances, settle the question of the existence or nonexistence of the scalar meson κ/K_0^*(700), and improve the constraints on their pole parameters, subsequently improving our knowledge of the low-lying scalar nonet in general.

    Guidelines for the use and interpretation of assays for monitoring autophagy (4th edition)

    In 2008, we published the first set of guidelines for standardizing research in autophagy. Since then, this topic has received increasing attention, and many scientists have entered the field. Our knowledge base and relevant new technologies have also been expanding. Thus, it is important to formulate, on a regular basis, updated guidelines for monitoring autophagy in different organisms. Despite numerous reviews, there continues to be confusion regarding acceptable methods to evaluate autophagy, especially in multicellular eukaryotes. Here, we present a set of guidelines for investigators to select and interpret methods to examine autophagy and related processes, and for reviewers to provide realistic and reasonable critiques of reports that are focused on these processes. These guidelines are not meant to be a dogmatic set of rules, because the appropriateness of any assay largely depends on the question being asked and the system being used. Moreover, no individual assay is perfect for every situation, calling for the use of multiple techniques to properly monitor autophagy in each experimental setting. Finally, several core components of the autophagy machinery have been implicated in distinct autophagic processes (canonical and noncanonical autophagy), implying that genetic approaches to block autophagy should rely on targeting two or more autophagy-related genes that ideally participate in distinct steps of the pathway. Along similar lines, because multiple proteins involved in autophagy also regulate other cellular pathways including apoptosis, not all of them can be used as a specific marker for bona fide autophagic responses. Here, we critically discuss current methods of assessing autophagy and the information they can, or cannot, provide. Our ultimate goal is to encourage intellectual and technical innovation in the field.